Outdoor THz fading modeling by means of Gaussian and Gamma mixture distributions

The terahertz (THz) band offers a vast amount of bandwidth and is envisioned to become a key enabler for a number of next-generation wireless applications. In this direction, appropriate channel models, encapsulating the large- and small-scale fading phenomena, need to be developed for both indoor and outdoor communication environments. The THz large-scale fading characteristics have been extensively investigated for both indoor and outdoor scenarios. The study of indoor THz small-scale fading has recently gained momentum, while the small-scale fading of outdoor THz wireless channels has not yet been investigated. Motivated by this, this contribution introduces the Gaussian mixture (GM) distribution as a suitable small-scale fading model for outdoor THz wireless links. In more detail, multiple outdoor THz wireless measurements recorded at different transceiver separation distances are fed to an expectation-maximization fitting algorithm, which returns the parameters of the GM probability density function. The fitting accuracy of the analytical GMs is evaluated in terms of the Kolmogorov-Smirnov, Kullback-Leibler (KL) and root-mean-square-error (RMSE) tests. The results reveal that, as the number of mixtures increases, the resulting analytical GMs fit the empirical distributions better. In addition, the KL and RMSE metrics indicate that increasing the number of mixtures beyond a particular number results in no significant improvement of the fitting accuracy. Finally, following the same approach as in the case of the GM, we examine the suitability of the mixture Gamma to capture the small-scale fading characteristics of the outdoor THz channels.

Indoor THz small-scale fading channel modeling has recently gained momentum 7,16,17,20,24-31. Specifically, for the case of wireless backhaul THz links, the small-scale fading has been theoretically modeled by means of the α−µ distribution 24,26. Then, the system performance has been quantified under different levels of transceiver hardware impairments, antenna misalignment and fading severity. Furthermore, the suitability of the α−µ distribution to describe the small-scale fading channel amplitude of indoor THz wireless channels has been experimentally validated in several studies 7,30,31. Experimental LoS and NLoS THz wireless measurements have been performed in an anechoic chamber 27. Based on these measurements, a stochastic indoor THz channel model has been developed, where the small-scale fading attenuation factor has been expressed in terms of a Rayleigh or Nakagami-m distribution under NLoS conditions and as a Rice or Nakagami-m distribution under LoS conditions. A two-dimensional stochastic geometric channel model has been developed for indoor THz wireless communications 28,29. Then, a parametric multipath Rice fading model has been derived. A measurement-based indoor channel model for the range of 126−156 GHz for both LoS and NLoS conditions has been developed 20. The exponential pathloss and shadowing have been used to model the large-scale fading, whereas the small-scale fading amplitude has been given by a novel distribution. Meanwhile, THz wireless measurements have been conducted within an anechoic chamber in the range of 240−300 GHz 25. Then, by exploiting the measurements and various fitting accuracy metrics, it has been concluded that the small-scale fading amplitude of the links can be accurately modeled by means of the Gamma and Gaussian mixture models.
Also, the mixture Gamma (MG) has been employed in investigating the capacity of a wireless channel, where expressions for the optimal and power rate adaptation and for the channel inversion with fixed and truncated rate were derived. The expressions were verified by means of Monte-Carlo simulations 32. Furthermore, the Gamma mixture has been used for the analytical performance assessment of composite fading channels in terms of the received signal-to-noise ratio 33. In continuation of the previously mentioned work, the Gaussian mixture has been employed in the performance analysis of an energy detector. In more detail, analytical expressions for the performance parameters of average detection and area under the receiver probabilities were derived 34.
The aforementioned contributions underline the importance of not only the large-scale, but also the small-scale fading THz channel modeling. However, to the best of the authors' knowledge, results on THz small-scale fading channel modeling in outdoor environments have not been published so far. Motivated by this, in this work, outdoor THz measurements performed in the campus area of Aalto University in Finland are exploited. In more detail, multiple LoS and NLoS links have been measured at different transceiver separation distances. For each link, multiple channel gain measurements were recorded, which have been used to fit the empirical distribution of the channel gain amplitude to analytical Gaussian mixture (GM) distributions. Evaluating the suitability of GMs to describe the small-scale fading channel gain amplitude of outdoor THz wireless links is useful for several reasons. An appropriate GM is capable of describing complicated fading scenarios, where multiple peaks can occur in the empirical distribution of the fading amplitude 25,35. The GM is expressed as the weighted sum of independent Gaussian distributions. Hence, it offers mathematical tractability, which is of great importance when evaluating analytical expressions. In this context, it should be noted that the fluctuating-two-ray (FTR) model has also been employed in THz channel modeling 36,37. However, the FTR uses an infinite number of components to approximate the empirical distribution. As a consequence, simpler distributions like the GM and Gamma mixture are preferred to accommodate the channel modeling and analytical evaluation needs. Moreover, it should be noted that, in this work, the suitability of GMs to model the small-scale fading amplitude of the outdoor THz links is investigated more thoroughly than that of MG distributions.
The reason for this is that the analytical expressions of the GMs are more tractable than those of the MGs and have been employed in various performance evaluation works 35,38,39. Also, the suitability of the MG to model the small-scale fading amplitude of short-range indoor THz wireless links has been previously investigated 25. Moreover, the support of a GM is (−∞, ∞), which aids in achieving a good fit to the tails of the empirical distributions.
In this work, the measurements of each link are preprocessed to obtain the channel gain of each of the recorded multipath components. Subsequently, in order to increase the number of different channel realizations in each link, a method based on adding random phases to the path amplitudes is employed. Then, by making use of the resulting channel realizations of each link, the empirical probability density function (PDF) and cumulative distribution function (CDF) are fitted to the analytical GMs. Also, MG distributions are fitted to some indicative links and the fitting performance is compared to that of the GMs. The parameters and weights of each Gaussian and Gamma distribution of a GM and MG expression, respectively, are obtained by fitting it to the empirical channel gain distribution of the investigated link. This is accomplished by means of the expectation maximization (EM) algorithm 25,35,38,40. The accuracy of the fit of the analytical distributions to the corresponding empirical ones is quantified in terms of the Kolmogorov-Smirnov (KS), Kullback-Leibler (KL) and root-mean-square-error (RMSE) tests 41-43. However, the evaluation of the fitting accuracy of the analytical GMs and MGs to the empirical ones is performed only in terms of the KL and RMSE tests, because the KS test yields a good fit for all the GMs and MGs of all the investigated links. As a result, the KS test is not a strict fitting criterion. According to the KL and RMSE metrics for all the links, it is observed that, as the number of mixtures increases, the resulting analytical GMs and MGs fit the empirical distributions better. On the other hand, as the number of mixtures decreases, the resulting analytical GMs and MGs fit worse, even for single-peak empirical distributions.
Furthermore, the KL and RMSE metrics indicate that increasing the number of mixtures above a particular threshold does not drastically improve the fitting accuracy of the analytical GMs and MGs to the empirical ones. To elucidate further, the key contribution of this work lies in the approach that is followed to derive the empirical small-scale fading amplitude distribution of the investigated THz links. In more detail, the principle of transfer learning combined with the EM algorithm is employed for the measured data of an outdoor static THz propagation environment 44. These THz wireless link measurement data contain deterministic pathloss measurements, and during each link measurement session there were no moving scatterers. Yet, in a realistic THz wireless signal propagation scenario, moving scatterers may influence the channel characteristics. This can be adequately modeled by the methodology initially proposed by Molisch et al. 45. In this work, this methodology is employed to populate the link measurement datasets used herein 44. Next, after observing the resulting empirical PDF of each measured THz link, we propose the GM distribution as a suitable target distribution. In order to identify the number of Gaussian distributions needed and their corresponding weights and parameters, we follow a fitting methodology based on an iterative EM algorithm.
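As a minimal sketch of the KS, KL and RMSE fitting accuracy metrics referred to above, the following Python functions compare an empirical distribution with a fitted analytical one, both sampled on the same grid. The discretised form of the KL divergence, the function names and the grid spacing `dx` are assumptions made for illustration; the exact definitions used in this work, including how the RMSE is expressed in dB, are not reproduced here.

```python
import math

def kl_divergence(p_emp, q_fit, dx):
    """Discretised KL divergence: sum of p*log(p/q)*dx over bins with p > 0."""
    eps = 1e-12  # floor on q to avoid log(p/0) when the fitted PDF underflows
    return sum(p * math.log(p / max(q, eps)) * dx
               for p, q in zip(p_emp, q_fit) if p > 0)

def rmse(y_emp, y_fit):
    """Root-mean-square error between two equally sampled curves (linear scale)."""
    n = len(y_emp)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_emp, y_fit)) / n)

def ks_statistic(cdf_emp, cdf_fit):
    """Largest absolute gap between the empirical and fitted CDF samples."""
    return max(abs(a - b) for a, b in zip(cdf_emp, cdf_fit))
```

Lower values indicate a better fit for all three functions, and identical inputs give zero in each case.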

Results
Measurement setup and sites. Figure 1 illustrates the top view of the outdoor premises of Aalto University in Finland, where the THz measurements were conducted. Each link is defined by a unique transmitter (Tx) and receiver (Rx) pair. Both the Tx and Rx are equipped with a single antenna. During each measurement session, both the Tx and Rx were in fixed positions and only the Tx-Rx pair of interest was active, i.e., no interference was induced by neighboring links. Figures 1a and b show the measurements performed with Rx 1 and Rx 2, respectively. The Txs marked with green dots denote a LoS link between the Tx and the Rx of interest, whereas the Txs marked with a yellow dot stand for a NLoS transceiver link. However, it should be noted that for the investigated outdoor THz measurements no paths could be received in the NLoS transmission scenarios. The THz transmissions of all the investigated links are performed at the center radio frequency (RF) of 142 GHz with a total bandwidth of 4 GHz 44. The transmit power is set equal to 5 dBm and the transceiver antenna heights are 1.85 m. The Rx is equipped with a sectoral horn antenna with a gain of 19 dBi, whereas the Tx is equipped with an omni-directional antenna. Also, during the measurement of each Tx−Rx link, the Rx antenna is rotated with an angular step of 5° and no moving objects are present.
Fitting of the Gaussian & Gamma mixtures to the channel gain measurements. In this section, the fading channels are approximated using the GM distribution. Also, some indicative fitting results of modeling the fading channels by means of the MG distribution are presented. In more detail, Figs. 2, 3, 4 and 5 serve as illustrative examples of the fitting achieved by the analytical GM and MG expressions, which are obtained as the weighted sum of K Gaussian and K Gamma distributions, respectively, to the empirical channel gain measurements of the investigated links. Table 1 quantifies the fitting achieved by the GMs to the empirical measurements of the links in terms of the KL and RMSE fitting accuracy metrics. The link, d, KL, R and K columns stand for the TX−RX link index, the transceiver antenna separation distance, the achieved KL and RMSE metric [...] the KL and RMSE tests are reliable fitting accuracy tests not only for the GMs but also for the MG distributions. Note that the parameters of the GMs and MGs extracted in this work can be found at the following link: https://github.com/T34gr/Gaussian-and-Gamma-mixture-distribution-parameters.git. Figure 2 illustrates the statistical characterization of the TX 1 −RX 1 and TX 28 −RX 2 links. In more detail, Fig. 2a shows the KL values of GMs and MGs with different K for both TX 1 −RX 1 and TX 28 −RX 2. As expected, for a given link, as K increases, the KL value of the GMs generally decreases. After a minimum KL value is reached, a further increase of K yields only small variations around this value. According to Table 1, for both links the maximum KL value is obtained for K = 1. Meanwhile, for K = 4 the first local minimum of KL is observed for the GMs of both TX 1 −RX 1 and TX 28 −RX 2, which is equal to 0.037 and 0.123, respectively. For the TX 1 −RX 1 link, the global minimum value of KL is achieved for the GM with K = 11, which can be found in Table 1.
On the other hand, for TX 28 −RX 2, according to Table 1, the global minimum value of KL is achieved for the GM with K = 9. For the case of MG modeling, Fig. 2a shows that for both TX 1 −RX 1 and TX 28 −RX 2 the KL is reduced as K increases. Also, for both links K = 1 yields the worst fit, where KL is 0.715 and 0.879, respectively. Furthermore, for TX 1 −RX 1 the KL results of the MGs tend to stabilize for K ≥ 15 and the best fit is achieved for K = 17 with KL = 0.019. Also, it is observed that the KL results of the MGs and GMs for the TX 1 −RX 1 link are similar for K ≥ 3. For the TX 28 −RX 2 link the MG KL results stabilize for K ≥ 10 and the best fit is accomplished for K = 20 with KL = 0.087. The KL values of the GMs in Table 1 and those of the MGs in Fig. 2a indicate that for both links the MG yields a better fit than the GM. However, as shown in Fig. 2e, both of the examined mixture distributions achieve an accurate fit to the empirical channel gain measurements. Meanwhile, Fig. 2b depicts the RMSE of the GMs and MGs for different values of K for both the TX 1 −RX 1 and TX 28 −RX 2 links. According to Table 1, for both of the aforementioned links the maximum RMSE value is obtained for K = 1. Meanwhile, for both TX 1 −RX 1 and TX 28 −RX 2, the GM with K = 4 yields the minimum RMSE, which is reported in Table 1. Also, Fig. 2b shows that for both TX 1 −RX 1 and TX 28 −RX 2 the RMSE values of the MGs are lower than those of the GMs. In more detail, the MG with K = 17 yields the best fit to the [...] Figure 2c and d serve as an illustrative example of the fitting achieved by the analytical GM expressions with different K to the empirical channel gain PDFs and CDFs for the links TX 1 −RX 1 and TX 28 −RX 2, respectively.
Specifically, the blue circles represent the empirical channel gain distributions of the investigated links, while the continuous and dashed lines stand for the fitted GMs of different K for the links TX 1 −RX 1 and TX 28 −RX 2, respectively. Note that, unless otherwise stated, the continuous and dashed lines of the same color denote GMs with the same K. By taking into account the KL and RMSE values of Table 1 and by examining the fitting of the PDFs and CDFs of the GMs to the empirical channel gain distributions of Fig. 2c and d, it can be ascertained that the increase of K leads to analytical GM expressions that better fit the empirical ones. Fig. 2e and f illustrate the fitting achieved by the analytical PDFs and CDFs of the GMs and MGs with different K to the empirical channel gain measurements of the TX 1 −RX 1 and TX 28 −RX 2 links. In these figures, the blue circles represent the empirical channel gain PDFs and CDFs of the links. The continuous and dashed red and green lines stand for the GMs with K equal to 4 and 11, which denote the best fitting GMs to the empirical distributions according to the RMSE and KL metrics of the TX 1 −RX 1 and TX 28 −RX 2 links, respectively. Moreover, the red crosses indicate the MGs that yield the best fit to the empirical distributions according to both metrics. In more detail, the MG with K_Γ = 17 yields the best fit to the empirical distribution of TX 1 −RX 1, whereas the MG with K_Γ = 20 yields the best fit to the empirical distribution of TX 28 −RX 2. Figure 3 depicts the statistical characterization of the TX 4 −RX 1 and TX 16 −RX 1 links. In Fig. 3a, the KL values of GMs and MGs with different K for both TX 4 −RX 1 and TX 16 −RX 1 are presented. For the case of GM modeling, it is observed that, for both TX 4 −RX 1 and TX 16 −RX 1, the KL is reduced as K increases. Up to K = 7, KL presents a significant variation for both links.
However, for K ∈ [8, 20], the resulting KL values stabilize. From Table 1, the minimum KL value for both the TX 4 −RX 1 and TX 16 −RX 1 links corresponds to a GM with K = 15, whereas K = 1 leads to the worst fit. For the case of MG modeling, it is observed that, for both TX 4 −RX 1 and TX 16 −RX 1, the KL is reduced as K increases. For TX 4 −RX 1, up to K = 5, KL shows a significant variation, whereas for K ∈ [6, 20] the KL values stabilize. Also, according to the KL metric the MG that performs the best [...] In Fig. 3b, it is observed that for the TX 4 −RX 1 link the RMSE results of the MGs vary significantly for K ≤ 8 and improve with the increase of K. Meanwhile, based on the RMSE metric, the MG with K = 1 yields the worst fit, whereas the best fit is achieved for K = 19 with R = −22.87 dB. Furthermore, the RMSE results shown in Table 1 for the GMs and in Fig. 3b demonstrate the better fitting accuracy of MGs compared to GMs for the empirical channel gain distribution of TX 4 −RX 1. For the TX 16 −RX 1 link, as Fig. 3b illustrates, the RMSE values of the MGs tend to stabilize for K ≥ 11. The best fit for the link according to the RMSE is achieved for K = 20 with R = −20.33 dB, whereas the worst is obtained for K = 1 with R = −8.57 dB. Also, according to the RMSE values of the GMs for the TX 16 −RX 1 link in Table 1 and Fig. 3b, the MGs yield a better fit to the empirical channel gain measurements of this link. Fig. 3c and d present the fitting accomplished by the analytical PDFs and CDFs of GMs with different K to the empirical channel gain distributions of the links TX 4 −RX 1 and TX 16 −RX 1. The blue circles represent the empirical channel gain distributions of the investigated links, while the continuous and dashed lines stand for the fitted analytical GMs for TX 4 −RX 1 and TX 16 −RX 1, respectively. By taking into account the KL and RMSE values of Table 1 and by observing Fig.
2c and d, it can be ascertained that the increase of K leads to analytical GM expressions with improved fit to the empirical PDF and CDF. Moreover, it is obvious that a single Gaussian distribution (i.e., K = 1) cannot accurately describe the empirical data. Figures 3e and f illustrate the fitting achieved by the analytical PDFs and CDFs of the GMs and MGs with different K to the empirical channel gain distributions of the TX 4 −RX 1 and TX 16 −RX 1 links. In these figures, the blue circles stand for the empirical channel gain PDFs and CDFs of TX 4 −RX 1 and TX 16 −RX 1. The continuous red and green lines denote the best fit achieved by the analytical GM to the empirical data of TX 4 −RX 1 according to the RMSE and KL metrics, respectively, while the corresponding dashed lines denote the best fitting GM curves to TX 16 −RX 1. Meanwhile, the curves marked with the red crosses and cyan dots indicate the analytical MGs that yield the best fit to the empirical distribution of the TX 4 −RX 1 link according to the RMSE and KL metrics, with K_Γ = 19 and K_Γ = 20, respectively, while the red crosses with K_Γ = 20 denote the MG that yields the best fit to TX 16 −RX 1 according to both metrics. Figures 3e and f illustrate that both the GMs and MGs can yield a good fit to the data and that both can be considered for THz small-scale fading channel modeling. Figure 4 presents the statistical characterization of the TX 25 −RX 2 link. In more detail, Fig. 4a shows the KL achieved by GMs and MGs with different K. For the GM it is observed that the KL improves as K increases. The value of K = 5 yields KL = 0.151, which is the first local minimum. Meanwhile, for K ≥ 9 the KL stabilizes close to the optimum value. For example, GMs with K = 9, 14, 18, and 20 result in KL = 0.131, 0.117, 0.106, and 0.112, respectively. Meanwhile, according to Table 1, K = 2 yields the maximum value of KL and hence the worst fit. Moreover, from Fig.
4a it is observed that the MGs have similar fitting performance to the GMs when the KL metric is employed. The best fit of the MG is achieved for K = 20, where KL = 0.016. The similar fitting performance of GM and MG can also be observed in Fig. 4c and d. In Fig. 4b the RMSE for GMs and MGs with different K is presented. In more detail, for the GMs the first local minimum is obtained for K = 4 and is R = −11.18 dB, while the second local minimum is obtained for K = 5 and is R = −13.1 dB. Moreover, for K ≥ 10 the RMSE stabilizes close to the optimum value. For example, the GMs with K = 10, 12, and 20 yield R = −13.47, −13.48, and −13.4 dB, respectively. Similar observations hold for the RMSE results of the MGs. However, according to this metric the MGs fit significantly better for K ≥ 13. The best fit is accomplished for the MG with K = 20, where R = −16.28 dB.
In Fig. 4c and d the fitting achieved by the analytical PDF and CDF GM and MG expressions with different values of K to the empirical channel gain distribution of TX 25 −RX 2 is presented. In more detail, the blue circles stand for the empirical distribution of the investigated link, whereas the continuous red, green and magenta lines indicate the GM with K equal to 4, 12 and 20, respectively. Also, the dashed black lines denote the analytical MG expressions obtained for K_Γ = 20, which denotes the best fitting MG based on both metrics. Figure 4c and d illustrate that the best fit to the empirical data is accomplished by the GM with K = 18, which is in accordance with the KL metric results. Also, it can be concluded that, in the case of an empirical PDF with multiple peaks, the increase of K leads to a GM with a higher fitting accuracy. In this sense, the GM with K = 4 yields the worst fit. As an example, for K = 4 the metrics are KL = 0.233 and R = −11.18 dB. Figure 5 presents the statistical characterization of the TX 17 −RX 1 link. In more detail, Fig. 5a shows the KL achieved by GMs and MGs with different K. It is observed that, for the GMs, the KL improves as K increases. In more detail, the GM with K = 5 yields KL = 0.139, which is a local minimum of the KL metric. Meanwhile, for K ≥ 11 the KL results are almost equal. For example, for K = 11, 15 and 20 the resulting KL is equal to 0.029, 0.03, and 0.026, respectively. Furthermore, based on Table 1, the GM with K = 2 yields the worst fit in terms of the KL metric. Meanwhile, from Fig. 5a it is similarly observed that the increase of K improves the fitting accuracy of the MGs to the empirical channel gain data. Also, the KL results tend to stabilize for K ≥ 10 and the best fit for the MG is accomplished for K = 11, where KL = 0.181.
It should be noted that, according to the KL values of the GMs in Table 1, the GM fits the empirical data better than the MG in terms of the KL metric. This significant difference is illustrated in Fig. 5c. In Fig. 5b the RMSE for GMs and MGs with different K is presented. It is observed that, for the GMs, the first and second RMSE local minima are R = −14.77 and −17.89 dB, which are obtained for a GM with K = 5 and 9, respectively. The minimum RMSE according to Table 1 is accomplished for the GM with K = 20. Meanwhile, as Fig. 5b illustrates, as K increases the RMSE of the MGs first improves and then deteriorates. This indicates that for the TX 17 −RX 1 link increasing the number of Gamma mixtures does not improve the fitting performance. The best fit in terms of the RMSE metric for the MG is achieved for K = 11, where R = −12.85 dB. Both the KL and RMSE metrics shown in Fig. 5a and b indicate that for an empirical distribution with multiple peaks the GM can yield a better fit than the MG. Figure 5c and d [...] distributions, whereas the continuous red, green and magenta lines indicate the analytical PDFs and CDFs of the GMs with K = 4, 12 and 20, respectively. Moreover, the dashed black lines stand for the MG obtained with K_Γ = 11, which denotes the best fitting MG based on both metrics. Figure 5c and d demonstrate that the GM with K = 20 yields the best fit. This can be verified by the KL and RMSE metric results of Table 1. Furthermore, it can be concluded that, in order to analytically describe an empirical distribution presenting multiple peaks, a GM with a greater K is needed. As Fig. 5c demonstrates, the GM with K = 4 achieves the worst fit to the empirical data.
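The observation that the metrics stabilize beyond a certain K suggests a simple way of choosing the number of mixtures: take the smallest K whose metric lies within a tolerance of the best value observed. The sketch below illustrates this with the GM KL values quoted above for the TX 17 −RX 1 link. The selection rule, the function name and the tolerance of 0.005 are assumptions made for illustration and are not the procedure followed in this work.

```python
def smallest_k_within_tolerance(ks, metric, tol):
    """Return the smallest K whose metric is within `tol` of the best (lowest) value."""
    best = min(metric)
    for k, m in sorted(zip(ks, metric)):
        if m - best <= tol:
            return k

# GM KL values quoted in the text for the TX 17 -RX 1 link
ks = [5, 11, 15, 20]
kl = [0.139, 0.029, 0.030, 0.026]
chosen_k = smallest_k_within_tolerance(ks, kl, tol=0.005)  # hypothetical tolerance
```

With these values the rule selects K = 11, which matches the remark that the KL results are almost equal for K ≥ 11.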

Discussion
The majority of the THz small-scale fading channel modeling works employ analytical distributions, such as Nakagami-m, Rayleigh, Rice, α−µ, and Weibull 7,27-29. However, these distributions are capable of describing only single-peak fading channels. In this work, the suitability of modeling single- and multiple-peak PDFs of outdoor THz channels in terms of GMs is investigated. Also, MGs are fitted to the empirical channel gain measurements of some indicative links. It is observed that, for both single- and multiple-peak empirical channel gain distributions, the increase of K yields GMs and MGs that better fit the data. This is verified by the results of the KL and RMSE fitting accuracy metrics. In more detail, for all of the investigated links, the KL and RMSE fitting accuracy deteriorates for the lower values of K. For most of the links, low values of K tend to yield significant variations of the KL and RMSE. On the other hand, for all of the examined links, as K increases beyond a specific value, the KL and RMSE fitting accuracy results tend to stabilize. This shows that, for any given link, the best fit is accomplished by a GM or MG with a particular value of K or higher. Hence, further increasing K is expected to make only a slight difference to the fitting performance of the GMs to the empirical distributions. Moreover, from the analytical GM and MG distributions illustrated in Figs. 2, 3, 4 and 5 and according to equations (4) and (7), the defining parameter for an analytical GM or MG to present significant peaks is the weight parameter w of its Gaussian or Gamma components. As an example, for analytical GM distributions such as those presented in Figs. 2(c) and 3(c), the differences between the w parameters are not significant for each K. On the other hand, for analytical GM distributions such as those that are shown in Figs.
4(c) and 5(c), especially as K increases, there are w values that are greater than the rest. As a result, the Gaussian component with such a w is more prominent in defining the peak amplitudes of the total GM. To demonstrate this, Table 2 presents the w parameter values for the Tx 1 −Rx 1 and Tx 25 −Rx 2 links. Moreover, the fitting accuracy statistics for the MGs employed in this work verify that the MGs can model the small-scale fading amplitude of THz links. By comparing the fitting accuracy of the MGs and GMs for some indicative TX-RX links, it is observed that they both achieve a good fit to the empirical channel gain measurements. This observation agrees with previous technical works, where both the GMs and MGs were found suitable for THz channel modeling 25. Meanwhile, the MG yields a better fit than the GM for the majority of the investigated links. However, the fitting accuracy of the GM is superior to that of the MG for links with multiple peaks and severe changes of amplitude. In more detail, as Fig. 5c illustrates and based on the KL and RMSE fitting accuracy tests, the GM yields an accurate fit with K = 20 to the empirical PDF of TX 17 −RX 1. On the other hand, the MG fails to yield a good fit to the data for K ≤ 20, where according to both metrics the best fit of the MG is accomplished for K = 11. As a consequence, the resulting analytical PDF and CDF MG expressions do not fit the empirical ones of the TX 17 −RX 1 link. Finally, as future work, we intend to use more outdoor THz wireless measurements and compare the fitting achieved by Gaussian and Gamma mixtures to the empirical channel distributions.

Methods
The THz free space pathloss can be severe even at distances of a few meters and a low transmission frequency. As an example, for an operational frequency of 140 GHz and a communication distance of 1 m the free space pathloss can be in excess of 80 dB 17,47. Moreover, the atmospheric water vapor causes severe attenuation to the propagating THz signal 7,15. Also, the wavelength of the emitted THz signal can be much smaller than the size of the obstacles within the propagation environment 48. As a consequence, the refraction and reflection losses of the THz band are significantly stronger than those of lower frequency bands 46,49-51. This leads to a significant reduction of the number of dominant rays, since the THz signal power is drastically weakened when it is reflected or scattered two or more times 48,49. In this sense, the ability of the THz electromagnetic wave to propagate through blockages is nearly lost, due to the severe penetration loss. As a result, the ability of THz signals to diffract around obstacles is significantly reduced. The aforementioned remarks show that the THz band yields non-rich multipath environments when compared, for example, to the mmWave band. However, there are still surfaces that can act as scatterers for propagating wireless THz signals 16,17,20,29,46. This leads to the existence of reflected NLoS multipath components carrying a significant amount of power, which are capable of being detected by the Rx. Nevertheless, the number of measured multipath components utilized in our analysis is still not adequate to perform small-scale fading statistical analysis for a THz wireless channel. This limitation is overcome by generating different realizations of the transfer function. This is accomplished by changing the phases of the measured multipath components of a link 7,45. The random phases are assumed to be stochastic and are given by a uniform distribution in the interval [0, 2π]. This assumption is based on the contribution of Molisch et
al., which was based on the principle that the aggregated phases of different paths in an environment of moving scatterers follow a uniform distribution 45. Hence, from the electromagnetic theory point of view, this is derived by taking into account the phase shift due to the Doppler effect, and it holds in any propagation environment where motion is present. The channel coefficient of the system can be obtained as 7,45 [...] where ψ_i ∼ U(0, 2π) represents the random phase of the i-th multipath component. Moreover, by assuming that the amplitude of the channel coefficients does not change dramatically among the progressing t_i, i.e., the channel can be considered as flat-fading, then t_i = 0 45. Also, the term U(·, ·) is the uniform distribution operator 52.
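The random-phase method described above can be sketched as follows. The code assumes that the measured multipath components are available as linear-scale path amplitudes, that the flat-fading condition t_i = 0 holds, and that the channel coefficient is the coherent sum of the paths after each one is rotated by an independent phase drawn from U(0, 2π). The function name and the input values are hypothetical, and this is an illustration under these assumptions rather than the exact processing chain used for the measurements.

```python
import cmath
import math
import random

def generate_realizations(path_amplitudes, n_realizations, seed=None):
    """Return |h| for n_realizations random-phase sums of the given path amplitudes."""
    rng = random.Random(seed)
    gains = []
    for _ in range(n_realizations):
        # Each path gets an independent phase psi_i ~ U(0, 2*pi); delays are set to zero.
        h = sum(a * cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
                for a in path_amplitudes)
        gains.append(abs(h))
    return gains

# Hypothetical link with three measured paths (linear amplitudes)
samples = generate_realizations([1.0, 0.5, 0.25], 1000, seed=1)
```

The resulting samples form the empirical channel gain distribution to which the mixture models are fitted. By the triangle inequality, every sample is bounded above by the sum of the path amplitudes.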
Expectation-maximization based fitting approach
The Gaussian and Gamma mixture models. The THz small-scale fading phenomenon has been the focus of many recent channel modeling studies 7,25,27,51. Moreover, it has been experimentally observed that there are wireless THz propagation scenarios where the small-scale fading channel amplitude shows significant fluctuations 25. In this sense, the commonly used analytical fading distributions may fail to describe the empirical data, which motivates the use of the GM 25,35,38. The PDF of the GM is defined as
$$ f_{gm}(x) = \sum_{i=1}^{K} \frac{w_i}{\sqrt{2\pi}\,\sigma_i} \exp\left(-\frac{(x-\mu_i)^2}{2\sigma_i^2}\right), \tag{4} $$
where $K$ and $w_i$ denote the number of GM components and the weight of the $i$-th mixture component, respectively. The parameters $\mu_i$ and $\sigma_i$ stand for the mean and standard deviation of the $i$-th GM component, respectively. Also, $w_i \in [0, 1]$ and
$$ \sum_{i=1}^{K} w_i = 1. \tag{5} $$
The CDF of the GM is expressed as
$$ F_{gm}(x) = \sum_{i=1}^{K} \frac{w_i}{2}\, \mathrm{Erfc}\left(\frac{\mu_i - x}{\sqrt{2}\,\sigma_i}\right), \tag{6} $$
where $\mathrm{Erfc}(\cdot)$ is the complementary error function 41. Moreover, it is worth noting that the $K$ Gaussian distributions that comprise equation (4) are mutually independent. Hence, the GM is not only a favorable distribution for modeling significant amplitude fluctuations of the empirical distribution, but it also offers analytical tractability. The latter is of great importance when the performance of a wireless system must be analyzed. Also, it should be noted that, since this work employs pathloss measurements, the argument $x$ of the GM is always non-negative; hence, for the PDF of equation (4), $x \in [0, \infty)$.
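As an illustration of the GM definitions, the following minimal sketch evaluates a weighted sum of Gaussian densities and the corresponding CDF written with the complementary error function. The function names `gm_pdf` and `gm_cdf` and the two-component parameter values are hypothetical and serve only as an example.

```python
import math


def gm_pdf(x, weights, means, sigmas):
    """Gaussian-mixture PDF: weighted sum of K Gaussian densities."""
    return sum(
        w / (math.sqrt(2.0 * math.pi) * s) * math.exp(-0.5 * ((x - m) / s) ** 2)
        for w, m, s in zip(weights, means, sigmas)
    )


def gm_cdf(x, weights, means, sigmas):
    """Gaussian-mixture CDF, using Phi(z) = 0.5 * erfc(-z / sqrt(2))."""
    return sum(
        0.5 * w * math.erfc((m - x) / (math.sqrt(2.0) * s))
        for w, m, s in zip(weights, means, sigmas)
    )


# Hypothetical two-component mixture; the weights sum to one.
weights, means, sigmas = [0.6, 0.4], [1.0, 2.5], [0.2, 0.4]
pdf_value = gm_pdf(1.3, weights, means, sigmas)
cdf_value = gm_cdf(1.3, weights, means, sigmas)
```

Because the mixture is a linear combination of its components, both functions reduce to a single pass over the $K$ parameter triplets, which reflects the analytical tractability mentioned above.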
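Mixture Gamma (MG) distributions have been employed in various channel modeling works in lower frequency bands, as well as in the THz band 25,54,55. The PDF of the MG is defined as
$$ f_{mg}(x) = \sum_{i=1}^{K} w_i\, \frac{x^{a_i - 1} \exp\left(-x / b_i\right)}{b_i^{a_i}\, \Gamma(a_i)}, \tag{7} $$
where $a_i$ and $b_i$ stand for the shape and scale parameters of the $i$-th MG component. Also, according to the definition of equation (7), $x \in [0, \infty)$ and the operator $\Gamma(\cdot)$ denotes the gamma function 41. The CDF of the MG is defined as
$$ F_{mg}(x) = \sum_{i=1}^{K} w_i\, \frac{\gamma\left(a_i, x / b_i\right)}{\Gamma(a_i)}, \tag{8} $$
where $\gamma(\cdot, \cdot)$ stands for the lower incomplete gamma function 41.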
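A comparable sketch for the MG is given below, assuming the shape/scale parameterization stated above. Since the Python standard library does not provide the incomplete gamma function, the helper `reg_lower_gamma` evaluates the ratio $\gamma(a, x)/\Gamma(a)$ through its power series, which is adequate for moderate arguments. All function names and parameter values are hypothetical.

```python
import math


def reg_lower_gamma(a, x, tol=1e-14, max_terms=10_000):
    """Regularized lower incomplete gamma, gamma(a, x) / Gamma(a), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a          # n = 0 term of sum_n x^n / (a (a+1) ... (a+n))
    total = term
    for n in range(1, max_terms):
        term *= x / (a + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(a * math.log(x) - x - math.lgamma(a))


def mg_pdf(x, weights, shapes, scales):
    """Mixture-Gamma PDF for x > 0 (x <= 0 returns 0 as a convention)."""
    if x <= 0.0:
        return 0.0
    return sum(
        w * math.exp((a - 1.0) * math.log(x) - x / b - a * math.log(b) - math.lgamma(a))
        for w, a, b in zip(weights, shapes, scales)
    )


def mg_cdf(x, weights, shapes, scales):
    """Mixture-Gamma CDF: weighted sum of regularized incomplete gamma terms."""
    return sum(w * reg_lower_gamma(a, x / b)
               for w, a, b in zip(weights, shapes, scales))


# Hypothetical two-component mixture.
mg_weights, mg_shapes, mg_scales = [0.7, 0.3], [2.0, 5.0], [0.5, 0.4]
mg_value = mg_cdf(1.0, mg_weights, mg_shapes, mg_scales)
```

Working in the log domain inside `mg_pdf` avoids overflow of $\Gamma(a_i)$ for large shape parameters.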
The expectation maximization algorithm. The weights and the parameters of the Gaussian distributions that compose the GM with the best possible fit to the empirical data must be identified by employing an appropriate method. The EM algorithm is such a method. The EM is a machine learning approach that simplifies maximum-likelihood-estimate (MLE) problems and is widely used for calculating the parameters of mixture models 25,35.
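The EM is a two-step algorithm that consists of the expectation (E) and the maximization (M) steps 40. To operate the EM algorithm, the number of mixtures $K$ and the vector $\mathbf{y} = (y_1, \ldots, y_n)$ of the $n$ channel gain measurements of a link are required as inputs. Subsequently, the mixture parameters are updated at the M-step during the $(m+1)$-th iteration of the EM algorithm until the convergence criterion is met; otherwise, the EM terminates when a predefined number of iterations is reached. The convergence criterion is defined as
$$ \left| L^{[m+1]} - L^{[m]} \right| < \varepsilon, \tag{9} $$
where $\varepsilon$ stands for the desired convergence value. The term $L^{[m]}$ signifies the MLE log-likelihood at the $m$-th iteration of the EM algorithm and can be obtained as
$$ L^{[m]} = \sum_{i=1}^{n} \ln\left( \sum_{j=1}^{K} w_j^{[m]}\, \varphi\left(y_i \,\middle|\, \mu_j^{[m]}, \sigma_j^{[m]}\right) \right), \tag{10} $$
where $j \in [1, K]$, $i \in [1, n]$ and $\ln(\cdot)$ stands for the natural logarithm. The term $\varphi\left(y_i \,\middle|\, \mu_j^{[m]}, \sigma_j^{[m]}\right)$ denotes the Gaussian PDF evaluated at $y_i$, with mean and standard deviation $\mu_j^{[m]}$ and $\sigma_j^{[m]}$, respectively. Meanwhile, the E-step of the EM is implemented as
$$ p_{ij}^{[m]} = \frac{w_j^{[m]}\, \varphi\left(y_i \,\middle|\, \mu_j^{[m]}, \sigma_j^{[m]}\right)}{\sum_{k=1}^{K} w_k^{[m]}\, \varphi\left(y_i \,\middle|\, \mu_k^{[m]}, \sigma_k^{[m]}\right)}, \tag{11} $$
where $p_{ij}^{[m]}$ is the posterior probability that the measurement $y_i$ belongs to the $j$-th mixture. Upon the completion of the E-step, the EM algorithm implements the M-step. The M-step provides the updated values of the distribution parameters of the $j$-th mixture at the $(m+1)$-th step of the algorithm, which for the particular case of a GM are calculated as
$$ w_j^{[m+1]} = \frac{1}{n} \sum_{i=1}^{n} p_{ij}^{[m]}, \tag{12} $$
$$ \mu_j^{[m+1]} = \frac{\sum_{i=1}^{n} p_{ij}^{[m]}\, y_i}{\sum_{i=1}^{n} p_{ij}^{[m]}}, \tag{13} $$
$$ \sigma_j^{[m+1]} = \sqrt{\frac{\sum_{i=1}^{n} p_{ij}^{[m]} \left(y_i - \mu_j^{[m+1]}\right)^2}{\sum_{i=1}^{n} p_{ij}^{[m]}}}. \tag{14} $$
The convergence of the EM algorithm depends on $K$ and the initialization values of the mixture parameters that are provided as inputs. Several methods are available to provide initialization values for the mixture parameters. One of the most common is to employ the K-nearest-neighbour (KNN) algorithm 56.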
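The E-step, M-step and stopping rule described above can be sketched for a one-dimensional GM as follows. This is a minimal illustration under stated assumptions rather than the fitting code used in this work: the stopping rule is taken to be the absolute change of the log-likelihood, the initial parameters are supplied by hand instead of by a clustering method, and the synthetic data and the name `em_gaussian_mixture` are hypothetical.

```python
import math
import random


def gauss_pdf(y, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)


def em_gaussian_mixture(y, w, mu, sigma, eps=1e-8, max_iter=500):
    """Fit a 1-D Gaussian mixture; w, mu, sigma are initial guesses of length K."""
    n, K = len(y), len(w)
    w, mu, sigma = list(w), list(mu), list(sigma)
    prev_ll, ll = -math.inf, -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability of each component for each sample.
        resp, ll = [], 0.0
        for yi in y:
            joint = [w[j] * gauss_pdf(yi, mu[j], sigma[j]) for j in range(K)]
            total = max(sum(joint), 1e-300)   # guard against underflow
            ll += math.log(total)
            resp.append([p / total for p in joint])
        # Stop when the log-likelihood change falls below eps (assumed criterion).
        if abs(ll - prev_ll) < eps:
            break
        prev_ll = ll
        # M-step: update weight, mean and standard deviation of each component.
        for j in range(K):
            nj = max(sum(r[j] for r in resp), 1e-300)
            w[j] = nj / n
            mu[j] = sum(r[j] * yi for r, yi in zip(resp, y)) / nj
            var = sum(r[j] * (yi - mu[j]) ** 2 for r, yi in zip(resp, y)) / nj
            sigma[j] = math.sqrt(max(var, 1e-12))
    return w, mu, sigma, ll   # ll is the log-likelihood from the last E-step


# Synthetic (hypothetical) data drawn from two well-separated Gaussians.
rng = random.Random(0)
data = [rng.gauss(1.0, 0.2) if rng.random() < 0.6 else rng.gauss(3.0, 0.3)
        for _ in range(600)]
w_hat, mu_hat, sigma_hat, loglik = em_gaussian_mixture(
    data, w=[0.5, 0.5], mu=[0.5, 3.5], sigma=[1.0, 1.0])
```

In practice the outcome depends on the initial guesses, which is why a data-driven initialization is normally preferred over the hand-picked values used here.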
Evaluation of the fitting
The Kolmogorov-Smirnov test. The KS goodness of fit test is defined as 41
$$ D = \max_{x} \left| F_{emp}(x) - F_{gm}(x) \right| \le \sqrt{-\frac{1}{2N} \ln\left(\frac{A}{2}\right)}, \tag{15} $$
where $F_{emp}(x)$ and $N$ stand for the empirical values of the channel gain CDF of the examined link and the number of discrete samples of $F_{emp}(x)$, respectively. The parameter $F_{gm}(x)$ denotes the analytical CDF of the examined analytical distribution, while $A = 5\%$ is the selected significance level.
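A minimal sketch of the KS check is given below. It assumes that the test statistic is the largest absolute gap between the empirical and the analytical CDF, and that the acceptance threshold is the usual large-sample approximation $\sqrt{-\ln(A/2)/(2N)}$, with $N$ taken here as the number of samples. The function name `ks_test` and the example data are hypothetical.

```python
import math


def ks_test(samples, cdf, significance=0.05):
    """Return (statistic, critical_value, passed) for a one-sample KS check."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs((i + 1) / n - f), abs(f - i / n))
    # Assumed large-sample critical value for the chosen significance level.
    critical = math.sqrt(-math.log(significance / 2.0) / (2.0 * n))
    return d, critical, d <= critical


# Hypothetical example: evenly spread samples tested against a uniform CDF on [0, 1].
example_samples = [(i + 0.5) / 100 for i in range(100)]
statistic, critical_value, passed = ks_test(example_samples, lambda x: x)
```

Any analytical CDF, such as that of a fitted mixture, can be passed as the `cdf` argument.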
Kullback-Leibler divergence test. The KL divergence test is defined as the distance between the empirical PDF $f_{emp}(x)$ and the analytical PDF $f_{gm}(x)$ of the examined distribution, i.e., 42
$$ D_{KL} = \int_{0}^{\infty} f_{emp}(x) \ln\left(\frac{f_{emp}(x)}{f_{gm}(x)}\right) \mathrm{d}x. \tag{16} $$
The closer the value of equation (16) is to 0, the better the fit of the analytical fading distribution to the empirical channel gain distribution.
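For histogram-based data, the KL divergence can be approximated by a Riemann sum over the bins. The sketch below assumes that both PDFs are sampled at the same equally spaced points with bin width `dx`, and that empty empirical bins contribute nothing to the sum. The function name `kl_divergence` and the numerical values are hypothetical.

```python
import math


def kl_divergence(f_emp, f_model, dx):
    """Riemann-sum approximation of the KL divergence between two sampled PDFs."""
    total = 0.0
    for p, q in zip(f_emp, f_model):
        if p <= 0.0:
            continue            # 0 * ln(0 / q) is taken as 0
        if q <= 0.0:
            return math.inf     # empirical mass where the model has none
        total += p * math.log(p / q) * dx
    return total


# Hypothetical two-bin example on [0, 1] with bin width 0.5.
kl_value = kl_divergence([1.0, 1.0], [1.5, 0.5], dx=0.5)
```

A value of zero is obtained only when the two sampled PDFs coincide on every non-empty bin.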
The root mean square error. The RMSE is defined as 43
$$ R = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( f_{emp}(x_i) - f_{gm}(x_i) \right)^2 }. \tag{17} $$
The lower the value of $R$, the better the fit of the analytical PDF $f_{gm}(x)$ to the empirical distribution. Also, it should be noted that the RMSE results are commonly presented in dB scale.
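The RMSE computation can be sketched as follows, assuming that the empirical and analytical PDFs are sampled at the same $N$ points. The conversion to dB is assumed here to be $10\log_{10}(\cdot)$, since the exact convention is not stated in the text. The function names `rmse` and `to_db` are hypothetical.

```python
import math


def rmse(f_emp, f_model):
    """Root-mean-square error between two PDFs sampled at the same points."""
    n = len(f_emp)
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(f_emp, f_model)) / n)


def to_db(value):
    """Convert a positive linear value to dB (assumed convention: 10 * log10)."""
    return 10.0 * math.log10(value)


# Hypothetical sampled PDFs.
r_linear = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
r_db = to_db(r_linear)
```

On a dB scale, a better fit corresponds to a more negative value whenever $R < 1$.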

Data availability
The initial data are owned by Aalto University, Finland. Any researcher affiliated with one of the ARIADNE project partners is allowed to access and use the shared data for research purposes. The shared data must, however, not be made accessible to any person not affiliated with an ARIADNE project partner. All the processed data presented in this work accompany this manuscript as supplementary material.